Speaker Verification using Lasso based Sparse Total Variability Supervector and Probabilistic Linear Discriminant Analysis

Authors

  • Ming Li
  • Charley Lu
  • Anne Wang
  • Shrikanth Narayanan
Abstract

In this paper, we propose a Lasso based framework to generate sparse total variability supervectors (s-vectors). Unlike the factor analysis framework, which uses a low dimensional Eigenvoice subspace to represent the mean supervector, the proposed Lasso approach uses l1 norm regularized least squares estimation to project the mean supervector onto a pre-defined dictionary. The number of samples in this dictionary is appreciably larger than the typical Eigenvoice rank, but the l1 norm of the Lasso solution vector is constrained. As a result, only a small number of samples in the dictionary are selected to represent the mean supervector, and most of the dictionary coefficients in the Lasso solution are 0. We denote these sparse dictionary coefficient vectors in the Lasso solutions as the s-vectors and model them using probabilistic linear discriminant analysis (PLDA) for speaker verification. The proposed approach generates results comparable to the conventional cosine distance scoring based i-vector system, and improvement is achieved by fusing the proposed method with either the i-vector system or the joint factor analysis (JFA) system. Experimental results are reported on the female part of the NIST SRE 2010 task, common condition 5, using equal error rate (EER), norm old minDCF and norm new minDCF values. The norm new minDCF cost was reduced by 7.5% and 9.6% relative when fusing the proposed approach with the baseline JFA and i-vector systems, respectively. Similarly, relative norm old minDCF cost reductions of 8.3% and 10.7% were observed in the fusion.
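
The projection step described in the abstract can be illustrated with a small sketch. It assumes the common penalized form of the Lasso, minimizing 0.5 * ||m - D w||^2 + lam * ||w||_1 over w, where m is a mean supervector, D is the dictionary, and w is the resulting sparse coefficient vector (the s-vector). The solver (cyclic coordinate descent), the toy dimensions, the random dictionary, and all names such as `lasso_coordinate_descent` and `lam` are illustrative assumptions, not details taken from the paper. The PLDA modeling of the s-vectors is not shown.

```python
import random


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_objective(D, m, w, lam):
    """Value of 0.5 * ||m - D w||^2 + lam * ||w||_1."""
    sq_err = 0.0
    for row, target in zip(D, m):
        pred = sum(d * c for d, c in zip(row, w))
        sq_err += (target - pred) ** 2
    return 0.5 * sq_err + lam * sum(abs(c) for c in w)


def lasso_coordinate_descent(D, m, lam, n_iter=200):
    """Approximately minimise the Lasso objective by cyclic coordinate descent.

    D is a list of rows (dim x n_atoms); each column is one dictionary sample.
    m is the vector to be represented (length dim).
    """
    dim, n_atoms = len(D), len(D[0])
    w = [0.0] * n_atoms
    residual = list(m)  # residual = m - D w, with w starting at zero
    col_sq = [sum(D[i][j] ** 2 for i in range(dim)) for j in range(n_atoms)]
    for _ in range(n_iter):
        for j in range(n_atoms):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes atom j.
            rho = sum(D[i][j] * residual[i] for i in range(dim)) + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                for i in range(dim):
                    residual[i] -= D[i][j] * delta
                w[j] = w_new
    return w


# Toy data (hypothetical): an overcomplete dictionary with more samples
# (50) than dimensions (20), and a vector built from two of its columns.
random.seed(0)
dim, n_atoms = 20, 50
D = [[random.gauss(0.0, 1.0) for _ in range(n_atoms)] for _ in range(dim)]
m = [1.5 * D[i][3] - 2.0 * D[i][17] for i in range(dim)]

s_vector = lasso_coordinate_descent(D, m, lam=0.5)
n_selected = sum(1 for c in s_vector if c != 0.0)
print(f"{n_selected} of {n_atoms} dictionary coefficients are non-zero")
```

The soft-thresholding step is what sets coefficients exactly to 0, which is the sparsity property the abstract relies on. A larger `lam` selects fewer dictionary samples, and a sufficiently large value drives every coefficient to 0.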


Similar resources

Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification

State-of-the-art speaker verification systems are based on the total variability model to compactly represent the acoustic space. However, short duration utterances only contain limited phonetic content, potentially resulting in an incomplete representation being captured by the total variability model thus leading to poor speaker verification performance. In this paper, a technique to incorpor...


PLDA in the I-Supervector Space for Text-Independent Speaker Verification

In this paper, we advocate the use of the uncompressed form of i-vector and depend on subspace modeling using probabilistic linear discriminant analysis (PLDA) in handling the speaker and session (or channel) variability. An i-vector is a low-dimensional vector containing both speaker and channel information acquired from a speech segment. When PLDA is used on an i-vector, dimension reduction i...


PLDA Modeling in I-Vector and Supervector Space for Speaker Verification

In this paper, we advocate the use of the uncompressed form of the i-vector. We employ probabilistic linear discriminant analysis (PLDA) to handle speaker and session variability for the speaker verification task. An i-vector is a low-dimensional vector containing both speaker and channel information acquired from a speech segment. When PLDA is used on an i-vector, dimension reduction is performed twice – ...


Subspace-constrained supervector PLDA for speaker verification

In this paper, we consider speaker supervectors as observed variables and model them with a supervector probabilistic linear discriminant analysis model (SV-PLDA). By constraining the speaker and channel variability to lie in a common low-dimensional subspace, the model parameters and verification log likelihood ratios (LLR) can be computed in this low-dimensional subspace. Unlike the standard i...


i-vector Based Speaker Recognition on Short Utterances

Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real world applications often have access to only limited duration speech data. This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented which provi...




Publication date: 2011